Failure by Design: Influence of the RTOS Interface on Memory Fault Resilience
نویسندگان
چکیده
Soft errors are emerging with the ongoing reduction of structure sizes in current and future hardware designs. This problematic is generally tackled by employing fault detection or tolerance measures from an applications’ point of view. At the same time, research commences to harden the operating system, often considered as remaining single point of failure. Certainly, these measures can effectively treat the symptoms of hardware faults. However, we argue that the operating system design per se can offer an intrinsic resilience against errors. Dynamic operating system designs, often resembling Unix-like interfaces, are obliged to cope with pointers and list-based data structures to provide the demanded flexibility. In contrast, especially in the domain of embedded systems this flexibility is often not needed. Here, static system designs can be deployed, which allow to avoid error-prone pointer-based memory operations. We believe, that a fully static system design can enhance the resilience against memory errors solely by reduced memory consumption and inherently more robust data structures. This paper studies the influences of memory faults on both, a dynamic and a fully static embedded operating system. Extensive injection campaigns, covering the entire fault space within the kernel data structures, will show that even when applying hardware-based fault detection mechanisms to a dynamic kernel, a static kernel design is still more than 75 percent less susceptible to silent data corruptions.
منابع مشابه
Evaluating Multipath TCP Resilience against Link Failures
Standard TCP is the de facto reliable transfer protocol for the Internet. It is designed to establish a reliable connection using only a single network interface. However, standard TCP with single interfacing performs poorly due to intermittent node connectivity. This requires the re-establishment of connections as the IP addresses change. Multi-path TCP (MPTCP) has emerged to utilize multiple ...
متن کاملA Non-Intrusive SWIFI technique for RTOS robustness testing1
Systems-on-a-chip (SoCs) technologies offer a great promise by shrinking an entire computer to a single chip. Real-time operating systems (RTOS) enable SoC applications to cope with more sophisticated control tasks. These operating systems are supposed to guarantee a correct service despite the existence of design application faults. Robustness can be defined as the capacity of a system to oper...
متن کاملImproving FPGA resilience through Partial Dynamic Reconfiguration
This paper explores advances in reconfiguration properties of SRAM-based FPGAs, namely Partial Dynamic Reconfiguration, to improve the resilience of critical systems that take advantage of this technology. Commercial of-the-shelf stateof-the-art FPGA devices use SRAM cells for the configuration memory, which allow an increase in both performance and capacity. The fast access times and unlimited...
متن کاملSHIELD: A Fault-Tolerant MPI for an Infiniband Cluster
Today’s high performance cluster computing technologies demand extreme robustness against unexpected failures to finish aggressively parallelized work in a given time constraint. Although there has been a steady effort in developing hardware and software tools to increase fault-resilience of cluster environments, a successful solution has yet to be delivered to commercial vendors. This paper pr...
متن کاملCAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip
By increasing, the complexity of chips and the need to integrating more components into a chip has made network –on- chip known as an important infrastructure for network communications on the system, and is a good alternative to traditional ways and using the bus. By increasing the density of chips, the possibility of failure in the chip network increases and providing correction and fault tol...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013